Frequently Asked Questions

Introduction
Site details
Content
Audience participation
- What is the significance of the "be the author" rule?
- What is the significance of the "do not demand that other people say things" rule?

Introduction

What is Matthew Explains?

This site is a subscription video service dedicated to exploring research results in artificial intelligence, computer science, and other subjects.

I envision Matthew Explains as being similar in nature to a graduate seminar. Most videos will focus on recent or important scientific papers in the relevant disciplines, and usually each video will present the content of one paper as well as my own thoughts about it. My hope is that viewers will also make comments, ask questions, and have discussions of the subject matter.

I'm not going to declare formal prerequisites (nor assign graded homework), but I think the viewers who will get most out of this service will be those who are basically able to read scientific papers in general, but not necessarily experts on the specific subject matter. It's partly meant to help viewers improve their skills at reading and appreciating scientific papers.

It is not my plan to present how-to guides on specific software tools. That is already covered, or should be, by the software's own documentation and in marketing material from tool vendors. Rather, Matthew Explains is meant to be a resource for those who want to learn how the technology works, with a focus on research results. This is a perspective of science, not engineering.

What are the prices?

Free membership (with access to some content) is free of charge.

Basic membership (with access to more content), in US dollars, is $10 for a month, $100 for a year; or in Canadian dollars, $12 for a month, $120 for a year.

Pro membership (with access to all content), in US dollars, is $100 for a month, $1000 for a year; or in Canadian dollars, $120 for a month, $1200 for a year.

Sales tax applies on top of these prices for customers located in Canada, regardless of which currency they use. See "What are the differences among membership levels?" for more on which level may be appropriate for you.

Who is Dr. Matthew Skala?

[Image: screen capture of Matthew Skala]

I'm your guide, facilitator, lecturer, whatever you want to call it: I make the videos and run the site.

I received a PhD in Computer Science from the University of Waterloo in 2008, and spent a number of years working as a researcher in computer science at institutions in Canada and Denmark. You can read most of my research publications through the list on my personal site. In 2016 I left academia and since then I've been running an electronic musical instrument business.

How can we contact you for customer service and similar?

For the moment, please use the email address mskala@northcoastsynthesis.com. At some point in the future I will probably spin up email service on the matthewexplains.com domain name as well, but that isn't in place yet.

What is North Coast Synthesis Ltd.?

That is the corporate entity through which I run my business dealings. The official description of my company's business is "electronic musical instruments and consulting"; this Web site falls under the heading of "consulting." For electronic musical instruments, see northcoastsynthesis.com.

What is "Eleven Freedoms"?

I run a small Web site at elevenfreedoms.org to promote my own view of the necessary conditions for artificial intelligence to truly be free. I think the traditional definition of "free software," although still important, as applied in practice today fails to capture important freedoms relevant to machine learning and artificial intelligence in particular. We need to recognize some new freedoms - especially those that cannot be protected solely by terms in copyright licences.

Read the site for more details. To summarize, the eleven freedoms are:

0-3. The traditional Four Freedoms of free software (run the program for any purpose; study how it works; redistribute copies; distribute modifications).
4. The freedom to run the program in isolation.
5. The freedom to run the program on the hardware you own.
6. The freedom to run the program with the data it was designed for.
7. The freedom to run the program with any data you have.
8. The freedom to run the same program again.
9. The freedom from having others' goals forced on you by the program.
10. The freedom from human identity.

Site details

What are the differences among membership levels?

With a Free membership, you pay nothing, and can view some content.

With a Basic membership, you pay a low price per month, and can view more content.

With a Pro membership, you pay a higher price per month, and can view all published content.

Without making specific promises, my rough targets are for Free, Basic, and Pro accounts to have access to roughly 25%, 75%, and 100% of the site, respectively. Some of the free content may also be available without creating an account at all, so it can serve as advertising, but I'll probably also make a lot of it login-required, at least when it's new. And there'll probably be some special benefits like private discussion areas for specific membership levels.

Most items posted on the site will appear in search results and on the front page even for visitors who don't have access to actually read or watch the content - in order to provide some incentive for signing up and upgrading your account - but that's not universal. There may also be hidden items that are visible at all only to those with a higher access level.

I'm not going to enforce specific rules about who should get which kind of account, but I see the Basic membership level as appropriate for private individuals who work in some other field but maybe have an interest in computer science, mathematics, and AI, want to learn more about these things, and are paying for it themselves; the Pro level is appropriate for practitioners who use this stuff in the context of their employment, can justify the higher subscription fee as a business expense, and want access to all the content.

When deciding what topics to cover, I will generally give more weight to what the Pro members want to see.

Why doesn't the checkout page show all options?

If you already have a current Basic subscription, then you can upgrade to Pro at any time, but you can renew your Basic subscription at the Basic level only if it will expire within the next four months.

If you already have a Pro subscription, then you can renew at the Pro level only if your subscription will expire within four months, and you can downgrade to Basic only if your subscription will expire within one month.

These rules are intended to prevent creating an extremely long subscription commitment, especially given that we offer 10:1 credit on downgrades. If someone bought a year of Pro and then immediately downgraded by buying a year of Basic, they'd end up with an 11-year prepaid subscription, which is a long time for us to commit to in such an uncertain world. There also could be a problem if (as sometimes happens) somebody bought a year's subscription, mistakenly believed the payment had not cleared, and so they attempted it again, possibly several times. Limiting renewals to only subscriptions that are nearing expiry reduces the amount of trouble such situations can cause.
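The renewal rules above can be sketched as a small Python function. This is only an illustration, not the site's actual code: the function name, the level strings, and especially the approximation of a "month" as 30 days are assumptions made for the example, as is the guess that visitors with no current subscription are offered both paid levels.

```python
from datetime import datetime, timedelta

# Assumption: a "month" is approximated as 30 days; the real cutoff may differ.
MONTH = timedelta(days=30)


def checkout_options(level, expiry, now):
    """Return the set of paid levels offered at checkout.

    level  -- current level: "free", "basic", or "pro" (hypothetical names)
    expiry -- datetime of the current expiry, or None if there is none
    now    -- the present moment
    """
    # Assumption: with no subscription, or an expired one, both levels are offered.
    if level == "free" or expiry is None or expiry <= now:
        return {"basic", "pro"}

    remaining = expiry - now
    options = set()
    if level == "basic":
        options.add("pro")  # upgrading is allowed at any time
        if remaining <= 4 * MONTH:
            options.add("basic")  # renewing only when close to expiry
    elif level == "pro":
        if remaining <= 4 * MONTH:
            options.add("pro")  # renewing only when close to expiry
        if remaining <= 1 * MONTH:
            options.add("basic")  # downgrading only in the last month
    return options
```

Under these assumptions, a Pro member with a year remaining is offered nothing at all, which is why the checkout page may show fewer options than expected.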

How do subscription dates work?

Assuming we accept the payment at all (see previous question and answer), when you buy a month or a year, you get a new expiry date one month, or one year, after the present moment (if you started with no, or an expired, subscription) or extending from your previous expiry date (if you are renewing an unexpired subscription at the same level).

Adding one month to a date takes you to the same day-of-month in the next month. Adding one year takes you to the same day-of-month and the same month, in the next year. In either case, dates that do not exist because of short months are handled by counting an appropriate number of days into the next month. For example, if you buy a one-year membership starting on February 29 of a leap year, the expiry date will be March 1 of the following year.
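One way to read the date rule above, sketched in Python. The function names are hypothetical and this is not necessarily how the site computes dates; the sketch assumes that a nonexistent day-of-month is resolved by starting from the first of the target month and counting forward, so the overflow lands in the following month.

```python
from datetime import datetime, timedelta


def add_months(start, months):
    """Add whole months, spilling nonexistent dates into the next month."""
    # Work out the target year and month, carrying into later years as needed.
    index = start.year * 12 + (start.month - 1) + months
    year, month = divmod(index, 12)
    month += 1
    # Day 1 always exists; counting forward from it handles short months.
    first = start.replace(year=year, month=month, day=1)
    return first + timedelta(days=start.day - 1)


def add_years(start, years):
    """Add whole years by adding twelve months per year."""
    return add_months(start, 12 * years)


# The example from the text: February 29 of a leap year plus one year.
print(add_years(datetime(2024, 2, 29), 1))  # 2025-03-01 00:00:00
```

Under this reading, buying one month on January 31 of a non-leap year would give March 3, because "February 31" is counted three days past the end of February. The time of day is carried through unchanged, which fits calculations done at 1-second accuracy.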

If you renew an unexpired existing subscription at a different level, then your existing expiry date is first adjusted to convert the time remaining to your new membership level at a rate of 1 unit of Pro time equivalent to 10 units of Basic, and then the new time you're buying is added on. This does mean that renewing a Basic subscription at the Pro level could make your expiry date come sooner than what it was before the renewal; but you don't lose any value of remaining subscription time, given the difference in price.
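The level conversion can be illustrated with a short Python sketch. The names here are hypothetical and the sketch covers only the adjustment step for an unexpired subscription; the newly purchased month or year would be added to the result afterwards.

```python
from datetime import datetime, timedelta

PRO_TO_BASIC = 10  # 1 unit of Pro time is equivalent to 10 units of Basic time


def convert_expiry(expiry, now, old_level, new_level):
    """Adjust an unexpired expiry date for a change of membership level."""
    remaining = expiry - now
    if old_level == "pro" and new_level == "basic":
        remaining = remaining * PRO_TO_BASIC  # downgrade: time stretches
    elif old_level == "basic" and new_level == "pro":
        remaining = remaining / PRO_TO_BASIC  # upgrade: time shrinks
    return now + remaining


now = datetime(2025, 1, 1)
# 100 days of Basic remaining become 10 days of Pro.
print(convert_expiry(datetime(2025, 4, 11), now, "basic", "pro"))  # 2025-01-11 00:00:00
```

In that example, buying one month of Pro on top of the adjusted date gives an expiry in February, sooner than the original April 11, even though no value has been lost. In the other direction, a full year of Pro converts to ten years of Basic, which is where the 11-year figure for an immediate downgrade comes from.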

The actual date calculations are done at 1-second accuracy, even though the dates are usually only displayed at 1-day accuracy.

How does the search feature work?

It's a straightforward case-insensitive substring/"phrase" search, on a slightly canonicalized version of the main text content of each entry. Type in a single word and it will return all the entries that contain that word. If you tick the "transcripts" box, then it will also search video transcripts.

This is not a very powerful type of search query, but it is, importantly, something that I can build cheaply and not fear Web robots overloading it. I don't want to ever have to generate the old PHPBB "please wait N seconds before searching" message. At some point I may also implement a more sophisticated regular-expression search, but that will require some careful analysis to make sure it is implemented safely and doesn't consume excessive resources.
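For readers curious what this kind of search looks like, here is a minimal Python sketch. It is an illustration rather than the site's implementation: the entry fields ("title", "text", "transcript") are hypothetical, and the canonicalization shown (Unicode normalization, case folding, and collapsing runs of whitespace) is only a guess at what "slightly canonicalized" might involve.

```python
import unicodedata


def canonicalize(text):
    """Assumed canonical form: NFKC, case-folded, single spaces."""
    text = unicodedata.normalize("NFKC", text).casefold()
    return " ".join(text.split())


def search(entries, query, include_transcripts=False):
    """Return titles of entries whose text contains the query as a substring."""
    needle = canonicalize(query)
    if not needle:
        return []  # an empty query matches nothing in this sketch
    hits = []
    for entry in entries:
        fields = [entry["text"]]
        if include_transcripts:
            fields.append(entry.get("transcript", ""))
        # Each field is checked separately so a phrase cannot span two fields.
        if any(needle in canonicalize(field) for field in fields):
            hits.append(entry["title"])
    return hits
```

The cost of each query is a linear scan over the stored text with no backtracking, which is part of why a substring search is cheap and predictable compared with a regular-expression search.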

How can I watch videos with a "smart TV" or similar player that requires a playlist link?

This is an experimental feature; it may not work well, and it may change. But please do try it - I'd like to get feedback so I can make it work well.

First, you must have an account, and be logged into your account. Go to your user profile page, by following the "Your profile" link in the menu with your name on it that appears at the upper right of every page.

On your user profile page, there will appear two "broadcast" links, one for RSS format, and one for M3U format. Put either of these links into your player. If one doesn't work, the other one might.

Do not share these links; they are specific to your user account.

Content

What is the tagging scheme for entries on this site?

Tags are free-form and may not always be used consistently, but here is an outline of tags and kinds of tags I intend to use:

General category of entry or research: alignment applications basics meta model-intro prompting security theory training
Modalities: audio code image text video
Specific models and groups: AIAYN BERT DeepSeek GPT LLaMA Mistral Qwen
(Note that I won't always tag every model mentioned in every paper, only those that are significantly used enough to be topics of the papers, on which readers might wish to search.)
Specific technologies and subjects: GAN LoRA MoE RAG attention diffusion hallucination sampling tokenization toxicity

Can we use the "research ideas" from Matthew Explains videos?

In most videos I present "research ideas": interesting questions arising from the content of the paper I'm discussing. These are things I might pursue myself if I had the time and resources to do so. Since, in general, I don't, they are basically available for whoever wants to use them.

However, you should be aware that the research ideas I list are quite often only what any reasonable reader would think of. I cannot promise that ideas I list are actually novel open problems. Especially in relation to older papers, it's quite possible that others have already studied the questions I thought of, and published answers for them, before I ever thought of or posted the questions. So you still need to do a proper literature search of your own.

My hope is that we can have interesting discussions of these ideas in the comment sections of this Web site. If members here want to work on the research ideas and collaborate formally or informally with other members to do new research work, all the better. As I say in the "what is" question above, I hope this site can function much like a graduate seminar.

If you do end up publishing academic work based on one of the listed research ideas, please cite the video in question, treating it like a Web resource or corporate whitepaper published by North Coast Synthesis Ltd. Don't just put my name in the acknowledgments paragraph. An appropriate citation form might look something like this:

Skala, Matthew. 2025. Dog-whistle GANs. In Matthew Explains. North Coast Synthesis Ltd. Online https://matthewexplains.com/11172644/.

If you're publishing in a professional venue then it's expected that you will know how to modify that appropriately for your venue's style rules, but it'd certainly be nice if you can get in both "Matthew Explains" and "North Coast Synthesis Ltd." as well as the URL for the specific posting, even if it is a posting limited to paying members. Note that you'd still cite a book even if would-be readers have to buy it.

If you want me to be involved in your research in a bigger way, such that I'd actually become a co-author of a paper or make a similar level of contribution to a non-academic project, then we can talk about that; but I'm no longer in a situation where I can do serious computer science research without being paid for it. As of this writing I am struggling to meet basic living expenses (food and rent). I might still have some limited bandwidth for free-software collaboration when everyone involved is working on the same basis, but if I'm a real participant in a project where you're making a profit or getting a pay cheque, then I should, too.

If you want to use ideas proposed by other members on this site, then that's between you and them, save that I expect you all to handle it reasonably.

Things published in my videos, or in the comment sections of this Web site, are probably not eligible for patent protection.

How do you prepare transcripts?

This may change over time as the technology advances, but at the moment, I start with the raw audio track from each video. That is mono 16-bit PCM with 48kHz sampling rate, and it already includes some dynamic-range compression. I do further compression, fairly aggressively, with the sox "compand" filter:

sox in.wav compressed.wav compand 0.3,1 6:-70,-60,-20 -5 -90 0.2

Those parameters are one of the example sets from the sox documentation. The point of doing the dynamic compression is that the transcription models seem to have trouble with variations in input level: if there are louder and quieter parts in the input, then the models are inclined to just skip over the quieter parts, apparently treating them as background interference instead of words that should actually be transcribed. So the dynamic compression tries to force every word to be near maximum volume, and be more likely to make it into the output.

Then I generate a rough transcript with whisper.cpp and the Whisper Large V2 model, on its default settings. Note that is not the latest and greatest Whisper model; in my experiments, the V3 and "V3 Turbo" models, although supposedly newer and better, are much too prone to skip sentences and insert extra words. I think they may be tuned for recognizing short commands in the presence of interfering background voices, so they ignore parts of a relatively clean recording that doesn't contain background voices, because the training includes an assumption that they at least ought to be skipping something. On the default settings, whisper.cpp generates a timed transcript, and although I'm not certain, I think just having timestamps turned on may help it avoid throwing massive amounts of repetition into short silences.

Finally, and as a very important step, I edit the transcript by hand. I use regex search in my editor to remove the time stamps; choose paragraph breaks and reformat; and go through the entire thing while listening to the video soundtrack (sped up a bit faster than real time) to make sure nothing important was skipped over or hallucinated. I also normalize spelling - in particular, for my own name, which the models always seem to want to spell with a C.
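The timestamp-removal step is done with an editor's regex search, but the same idea can be sketched as a Python script. This assumes the timed transcript has one segment per line, each starting with a stamp of the form "[00:00:00.000 --> 00:00:04.500]"; the function name is hypothetical, and the result still needs paragraph breaks and a full listen-through by hand.

```python
import re

# Assumption: each line starts with "[HH:MM:SS.mmm --> HH:MM:SS.mmm]".
STAMP = re.compile(
    r"^\s*\[\d{2}:\d{2}:\d{2}\.\d{3} --> \d{2}:\d{2}:\d{2}\.\d{3}\]\s*"
)


def strip_timestamps(raw):
    """Drop the leading time stamp from each line and join the text."""
    lines = (STAMP.sub("", line).strip() for line in raw.splitlines())
    # Skip lines that are empty once the stamp is gone.
    return " ".join(line for line in lines if line)


raw = (
    "[00:00:00.000 --> 00:00:04.500]   Hello and welcome.\n"
    "[00:00:04.500 --> 00:00:09.000]   Today's paper is about attention.\n"
)
print(strip_timestamps(raw))  # Hello and welcome. Today's paper is about attention.
```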

So far I haven't found a fully automated transcription pipeline that I would trust, but this one does pretty well at keeping the amount of human labour low.

Audience participation

What is the significance of the "be the author" rule?

This is primarily an attempt at formulating a rule against abuse of generative language models in the discussion areas of the site, while accommodating the typical subject matter of this Web site (which would make a flat no-model-output rule unworkable). We talk a lot about language generation here. It's to be expected that you will be running models and talking about the results you obtain and even quoting from the output you get; so I don't want to forbid that. At the same time, it is a problem if users waste my, and each other's, time with postings of "slop."

So the rule is: you can use models to do things like spelling and grammar checking. There is no magic line between a spelling or grammar checker that happens to use a neural network and one that doesn't. You can type a proposed posting into a chatbot and ask it to simulate having an "opinion," much as you might do with a human editor. You can do experiments, and talk about your results from running models, and even quote from the output you obtained in the context of a posting you are writing yourself - much as you can quote from a human author when appropriate in the context of your own writing. But what you must not do is anything that makes you not really the author of what you're posting.

Do not set up a model so that it pretends to be a discussion-thread participant and posts automatically. Do not reply to a human's question with "I asked ChatGPT and here's what it said: [2000 tokens of slop]." Don't even do that if it's your own question - do not originate a thread with "I was wondering about such-and-such topic, so I asked ChatGPT and this is what it said: [2000 tokens of slop], but the pizza sticks to my mouth and fingers, what's wrong?" In each of these cases you are not really being the author of what you pressed the button to post, and you're wasting the time of the other humans in the discussion by asking them to evaluate the model's output.

If to support a discussion here you want to make available a significant amount of data for which you are not the author - whether it's a lengthy quote of model output or anything else - then I'd suggest posting it on the Web somewhere, and mentioning the URL in your comment here. That way it'll be available to interested readers without imposing on everybody in the discussion thread.

As mentioned in another rule, factual accuracy is not an excuse for posting something you shouldn't. Even if a chatbot happened to generate a factually correct answer to a question, posting a chatbot answer as a substitute for being the author yourself is against the rule. The issue isn't whether the chatbot's writing is true, but whether posting it is an appropriate use of shared resources.

What is the significance of the "do not demand that other people say things" rule?

There's a specific pattern that sometimes comes up in online discussions - not really often, but often enough that I think it's worth banning - where someone tries to bait someone else into saying specific words and then attempts an argument from silence if the bait isn't taken up, usually becoming increasingly hysterical as the discussion continues not to go their way:

Come on, admit that President Harding was a scumbag! Why are you afraid to say so? It's very simple, just type: "Warren Harding was a scumbag." I'm waiting! If you're afraid to admit that Warren Harding was a scumbag it must be because you're in the pocket of Big Oil, taking payoffs for Teapot Dome!

Real cases are usually even stupider than the above fictional example. Of course, it's easy to observe that this is also a violation of the "do not bring in an outside agenda" rule. Bearing in mind the rule that factual accuracy does not excuse misbehaviour, someone who pulled the above routine and was warned to stop would not be well-advised to respond by attempting to introduce evidence that Harding really was a scumbag.

So, one application of the rule is: never do that.

Another application has to do with deferential forms of address. In particular, many people likely to participate here happen to have PhD degrees. I have one myself; I know exactly how much it's worth and what makes it important. And I do introduce myself as "Dr. Matthew Skala" sometimes, especially in some marketing contexts where I think it may help establish credibility. But it'd be tacky, and beneath the dignity of the degree itself, were I to demand that everybody should call me "Dr. Skala" every time, in every context. You shouldn't make such demands here either.

The general principle here is that what other people say is their choice, not yours. You don't get to put words in anybody's mouth.
