Wikipedia talk:Untagged images/Archive 1
This is an archive of past discussions about Wikipedia:Untagged images. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the current talk page.
There are lists of untagged images at Wikipedia:Untagged Images. These images need to be tagged, and images that have been tagged need to be removed from the lists. Images that are not tagged will not be included in the planned Mandrakelinux distribution (see m:Wikimedia and Mandrakesoft). Please see Wikipedia:Image copyright tags for details of how to tag images. Based on a sample of 277 untagged images, at least 1 in 5 should have been tagged GFDL. By that estimate, there are around 10,000 GFDL images that won't be distributed unless they are tagged.
You can often tag an image even if you don't know where it came from, because the uploader has written something on the page like "my own photo" or "GFDL" but hasn't added the GFDL tag. If it's their own photo and they clicked the box saying "I affirm that the copyright holder of this file agrees to license it under the terms of the Wikipedia copyright", you can assume it's GFDL. Other images have a link to a source. If this source is a work of the U.S. federal government, you can add {{PD-USGov}}. If it's a logo or album cover, you can generally assume it's {{fairuse}}. If a date is given on the image description page and it is before 1923, you can add {{PD-US}}. Often the image description page says what the licence is but doesn't use a tag; to allow automatic filtering, we need the tags. If you're really keen, you can also drop a few notes on user_talk pages asking about the sources of specific images. Remember that many people uploaded images before tags existed, have since forgotten those uploads, and so haven't gone back to tag them. A user_talk message might help jog their memory.
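The rules of thumb above amount to a small decision procedure. The sketch below is a hypothetical illustration, not an official tool: the `ImageInfo` fields are assumptions standing in for what a person reads off the description page, and `suggest_tag` only proposes a tag for a human to confirm. It checks the fair use rule last, on the assumption that a free tag should win whenever one applies and that fair use claims deserve the most care.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ImageInfo:
    # Hypothetical fields a tagger reads off an image description page.
    own_work: bool = False             # "my own photo" plus the licence box ticked
    us_federal_source: bool = False    # source is a U.S. federal government work
    logo_or_album_cover: bool = False
    year: Optional[int] = None         # date given on the page, if any


def suggest_tag(info: ImageInfo) -> Optional[str]:
    """Return a candidate tag for a human to confirm, or None if no rule applies."""
    if info.own_work:
        return "{{GFDL}}"
    if info.us_federal_source:
        return "{{PD-USGov}}"
    if info.year is not None and info.year < 1923:
        return "{{PD-US}}"
    if info.logo_or_album_cover:
        # Fair use is the weakest claim, so it is only suggested when
        # none of the free-content rules above matched.
        return "{{fairuse}}"
    return None
```

For example, an old poster that is also a logo gets `{{PD-US}}` rather than `{{fairuse}}`, and an image with no usable information gets `None` and stays on the list for manual research.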
Please be careful about liberally slapping fair use notices on things.
The deadline for this is currently the end of October 2004. However, there will also be later releases, so work done after that deadline will still be useful.
- What I would find extremely useful is a list of untagged images by who uploaded them. People could quickly find and deal with their own images while those from known copyright violators could be more easily found and deleted. - SimonP 22:56, Sep 6, 2004 (UTC)
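A per-uploader list like the one SimonP describes could be produced along these lines. This is a minimal sketch, assuming the untagged-image query can be exported as (title, numeric user id) pairs; `group_by_uploader` and `render_wikitext` are hypothetical names, and the optional id-to-name mapping stands in for whatever user data is available. Without it, each section falls back to the numeric id.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple


def group_by_uploader(rows: Iterable[Tuple[str, int]]) -> Dict[int, List[str]]:
    """Bucket untagged image titles by the numeric id of the uploader."""
    groups: Dict[int, List[str]] = defaultdict(list)
    for title, user_id in rows:
        groups[user_id].append(title)
    return dict(groups)


def render_wikitext(groups: Dict[int, List[str]],
                    names: Optional[Dict[int, str]] = None) -> str:
    """One wiki section per uploader, biggest batches first.

    Falls back to "user id N" when no id-to-name mapping is available.
    """
    names = names or {}
    lines: List[str] = []
    # Sort by descending batch size, then by id so the output is stable.
    for uid in sorted(groups, key=lambda u: (-len(groups[u]), u)):
        label = names.get(uid, f"user id {uid}")
        lines.append(f"== {label} ({len(groups[uid])}) ==")
        lines.extend(f"* [[:Image:{t}]]" for t in sorted(groups[uid]))
    return "\n".join(lines)
```

Putting the largest batches first means the uploaders who could clear the most images at once appear at the top of the page.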
Money images?
What would images of money, such as Image:1000TRLira.jpg, be tagged as? --Sparky the Seventh Chaos 00:08, Dec 9, 2004 (UTC)
- user:Grinner has been adding {{PD}} to a bunch of these without much in the way of justification:
I suspect there might be a little more work required on those, so I've pulled them into a separate section. --Phil | Talk 13:23, Dec 9, 2004 (UTC)
Money image, seems like Pub. Dom. to me
- U.S. currency is Public Domain, as U.S. law doesn't allow government-created works to be copyrighted. The currency of other countries is probably public domain, but I don't know for sure. – Quadell (talk) (help) 21:49, Dec 9, 2004 (UTC)
I added a {{money}} tag. Right now, it says money designs are public domain. If that turns out to be incorrect, I can just change the tag. For now, you can tag all official money images with the {{money}} tag. (Note that if it's a picture of a hand holding a dollar bill, this doesn't count. It has to be just a picture of the money design.) – Quadell (talk) (help) 23:53, Dec 9, 2004 (UTC)
- Euro banknote design is clearly copyrighted (the notice on the notes kind of gives it away!). As such, photos of euro notes can presumably be classed as fair use, but images of the design alone are certainly not public domain, and even photos of the notes should acknowledge the design copyright.
- While we clearly should be able to use photos of euro notes on Wikipedia, there is no way we can apply the {{money}} tag to them.
- zoney ♣ talk 17:07, 10 Dec 2004 (UTC)
Discussion continued at Wikipedia talk:Image copyright tags#Pictures of money. – Quadell (talk) (help) 20:43, Dec 10, 2004 (UTC)
Move out of user space?
There is a taboo about changing pages in the User: space. I wonder if it is inhibiting progress on this project to some degree. Is a move to the Wikipedia: space merited? --Ben Brockert 05:08, Dec 9, 2004 (UTC)
- Taboo? Hah, there is no such taboo, just be bold and get started! Although I do feel this could go into the Wikipedia namespace, if we hurry up we will not need it :) -- Solitude\talk 09:04, Dec 9, 2004 (UTC)
It's out of user space now. But you probably noticed that already. :)
Should be finished by Xmas
Looking at the current trend, we could have this finished by Christmas, a nice present to the Wikipedia community. Of course, there are now 3600 images in the Unverified category... Evil Monkey → Talk 05:31, Dec 14, 2004 (UTC)
- That'll be the next project. But let's not get ahead of ourselves. :) – Quadell (talk) (help) 14:18, Dec 14, 2004 (UTC)
- Hmm. That might be a bit ambitious. But ambition is good. I'm off work next week and should find time to knock off quite a bit. --Kbh3rd 03:40, 17 Dec 2004 (UTC)
- Well, it looks like we missed my prediction, but the trend is good. Currently we are tagging an image a minute. However, browsing through the unverified category, it seems to me that people are just slapping an unverified tag on images whose copyright status could be found with about five minutes of googling. Evil Monkey → Talk 04:00, Dec 27, 2004 (UTC)
NASA flight patches
I haven't come across these yet, but I've just remembered something about them. http://spaceflight.nasa.gov states that:
The NASA insignia design for Space Shuttle flights is reserved for use by the astronauts and for other official use as the NASA Administrator may authorize. Public availability has been approved only in the form of illustrations by the various news media. When and if there is any change in this policy, which we do not anticipate, it will be publicly announced.
What does that mean for Wikipedia? I know that images and basically everything else from NASA are PD, but what about the NASA emblem and these flight patches? Evil Monkey → Talk 21:42, Dec 19, 2004 (UTC)
- Those restrictions aren't copyright issues, I believe. They are separate restrictions to prevent a private group from donning the NASA emblems to pass themselves off as NASA employees. So they aren't copyrighted, but there are restrictions on how they can be used. I would say the {{PD-USGov-NASA}} tag is still accurate. – Quadell (talk) (help) 14:18, Dec 20, 2004 (UTC)
Additional SQL help required
Has anyone answered the call for regenerating the list from a copy of the database? It's something I might be able to do, so long as I can grok the schema and have enough disk/memory space to work with it. But don't let me stop someone else who feels they have the resources and skills to do this. Kbh3rd 17:04, 12 Jan 2005 (UTC)
- I don't think anyone has volunteered yet. It would be very helpful. – Quadell (talk) (help) 21:49, Jan 12, 2005 (UTC)
Just out of curiosity, aren't the original SQL statements available as a start? Or has the schema changed so drastically since then? RedWolf 06:18, Jan 13, 2005 (UTC)
- I'm not even really sure who ran the original SQL queries. Does anyone know? – Quadell (talk) (help) 21:42, Jan 13, 2005 (UTC)
- I asked User:Yann, who replied: "I asked this list to be created but I didn't participate much to tagging since I mainly work on fr: and other projects and languages. I moved the page to Wikipedia:Untagged Images. If an update is needed, it should be asked to Looxix who has the status to do that. Yann 18:08, 9 Dec 2004 (UTC)", and so I asked User:Looxix but got no reply. So I think we probably have to start from scratch. --Tagishsimon (talk) 22:14, 18 Jan 2005 (UTC)
Re-generating the list
I have spent a bit of time seeing if I can re-generate a new list. So far I have downloaded the latest database dump (Jan 7/05), imported it into a local MySQL database (~2 hours) and then run the following SQL:
tee untagged.txt;
select cur_title from cur where cur_namespace=6 and
locate('{{',cur_text) = 0 and locate('}}',cur_text) = 0;
The result from mysql: 19903 rows in set (1 min 40.96 sec)
Using "vi", I converted the output to wiki link list format and then split the file into 1,000-line chunks. I have uploaded the first chunk to User:RedWolf/untaggedImages-xaa for now. Keep in mind that the latest database dump is two weeks old, so the list will contain a lot of images that have been tagged since then. This is my first cut at the SQL and it is probably not totally correct, but at least it's a start. Note that the Image namespace is number 6 in the cur table. RedWolf 19:49, Jan 21, 2005 (UTC)
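The "vi" step (turning the query output into wiki links and cutting it into 1,000-line pages) could also be scripted. A minimal Python sketch, assuming the titles were exported one per line with no table borders (for example with the mysql client's `--batch --skip-column-names` options, rather than the boxed table an interactive session logs via `tee`). The function names are hypothetical, and the `xaa`, `xab`, ... keys imitate the default output names of the Unix `split` command, which the page name untaggedImages-xaa appears to reflect.

```python
from itertools import islice
from typing import Dict, Iterable, Iterator, List

CHUNK_SIZE = 1000  # one list page per 1,000 images, as in the post above


def to_wiki_link(title: str) -> str:
    """Bulleted link to an image description page.

    cur_title stores spaces as underscores; the leading colon in
    [[:Image:...]] makes MediaWiki link to the page instead of
    displaying the picture inline.
    """
    return "* [[:Image:" + title.replace("_", " ") + "]]"


def chunked(items: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive lists of at most `size` items."""
    it = iter(items)
    while True:
        block = list(islice(it, size))
        if not block:
            return
        yield block


def split_suffixes() -> Iterator[str]:
    """aa, ab, ..., az, ba, ... (the naming scheme of split(1))."""
    letters = "abcdefghijklmnopqrstuvwxyz"
    for first in letters:
        for second in letters:
            yield first + second


def make_pages(titles: Iterable[str], size: int = CHUNK_SIZE) -> Dict[str, str]:
    """Map page suffixes like "xaa" to wikitext holding up to `size` links."""
    cleaned = (t.strip() for t in titles)
    links = (to_wiki_link(t) for t in cleaned if t)  # skip blank lines
    return {"x" + suffix: "\n".join(block)
            for suffix, block in zip(split_suffixes(), chunked(links, size))}
```

Usage would be along the lines of `make_pages(open("untagged.txt"))`, then pasting each value into the correspondingly named subpage.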
- Awesome. Already removed one that had been tagged. Evil Monkey → Talk 20:44, Jan 21, 2005 (UTC)
- Yes, that looks fantastic, RedWolf. (Fantastic, except for the fact that there are now 20,000 more images to tag.) Since we're so close to being done with the old batch, and since so many images are being tagged right now, I'd suggest you use the first dump available after the current batch is done, then post the pages from it as the new batch. – Quadell (talk) (help) 20:58, Jan 21, 2005 (UTC)
- Considering there were about 9,000 untagged images left from the "old" list as of January 7, one might hope the actual number would drop to about 12,000 (although that in itself is still a rather large number to be tagged). Once a new database dump is available after the current list is done (which appears to be less than a day away now), I'll re-generate a list from it and we can decide how to set it up for round 2. RedWolf 21:59, Jan 21, 2005 (UTC)
- One question is when we will start on the unverified images. The category currently has probably 10,000 images (I can't tell now due to the way images are displayed in categories). A cursory glance shows that a lot of these can be tagged properly, either because they have enough info in the description or after a quick Google search for the source. Evil Monkey → Talk 22:12, Jan 21, 2005 (UTC)
- I think that could be another project on its own. Some policies need to be created on how to deal with all the unverified images, especially those from absent wikipedians. RedWolf 02:22, Jan 22, 2005 (UTC)
- Tip of the hat to RedWolf for SQLing; thank you. Can I remind you of a posting from way back at the top of this page, by SimonP, reprinted in italics below? I can see pros and cons in his idea, but thought it worth drawing attention to now that new lists are contemplated. --~
- What I would find extremely useful is a list of untagged images by who uploaded them. People could quickly find and deal with their own images while those from known copyright violators could be more easily found and deleted. - SimonP 22:56, Sep 6, 2004 (UTC)
- Wow. Yes, that would be very useful. – Quadell (talk) (help) 14:10, Jan 22, 2005 (UTC)
- Yes, I have seen SimonP's request but the first problem is getting the user table. The regular database dump only contains the "cur" table which only has a user id, so I need the user table for mapping to a name. I have not been able to locate the user table as yet. RedWolf 19:16, Jan 22, 2005 (UTC)
- If it is not available, a list of images grouped by user_id would allow us to drill into an image file to find the user; a small hardship. If you'd be willing, I think some static pages of links to images by type of tag would help: it's reasonable to suppose that we want to go through the images to QA the tagging that has been done. Using the category browse (cf. [Category:GFDL images]) is very slow, presumably because of the query being executed. To be honest, a whole set of static pages /with/ thumbnail images à la [Category:GFDL images] might be best, since it would make it easy to spot mis-tagged images such as coins, flags &c., but that would take things to the point where you'd need a bot to upload that many new pages. --Tagishsimon (talk)
- Another thing would be to generate hypotheses about the likely tag of the image. What I mean is this: often an image description page will say "GFDL" or "Wikipedia licen[sc]e" or "{taken,drawn,created} by {me,myself}", but not "{{GFDL}}". Those could be put into a list of potential GFDL candidates. Similarly for "PD" or "fair use" on the image description page, or with "coa" or "flag" in the file name. That way I'd have to keep essentially just one set of criteria in mind and one tag in the clipboard when I go through a list of possible GFDL or PD candidates. Yet another thing would be to special-case certain contributors who generally license all of their images under a free license or place them into the public domain, like for example Arpingstone does for his own pictures. --MarkSweep 15:39, 22 Jan 2005 (UTC)
- I've had an initial try at that, and it looks very promising. See User:Kbh3rd/Tagging queries for a sample query and truncated results. That query could be improved, and I can think of lots of other potentially useful approaches.
- Uh, do we want to open a new page for discussing phase two of the project? This subthread on this page is getting rather unwieldy.
- -- Kbh3rd
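MarkSweep's idea of pre-sorting untagged images into candidate lists could look roughly like the following. It is only a heuristic sketch: the bucket names, the exact regular expressions and the `KNOWN_FREE_UPLOADERS` set are assumptions built from the examples in his comment, each page is assumed to arrive as (title, description text, uploader name), and every hit still needs a human to check it. Substring matches such as "coa" will produce false positives (e.g. "coast"), which is tolerable for a worklist but not for automatic tagging.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical rules: (candidate list, pattern searched in the description text).
TEXT_RULES = [
    ("GFDL", re.compile(
        r"\bGFDL\b"
        r"|\bWikipedia\s+licen[sc]e\b"
        r"|\b(?:taken|drawn|created)\s+by\s+(?:me|myself)\b",
        re.IGNORECASE)),
    ("PD", re.compile(r"\bPD\b")),  # upper case only, to keep noise down
    ("fair use", re.compile(r"\bfair\s+use\b", re.IGNORECASE)),
]

# Pattern searched in the file name; crude on purpose (also hits "coast").
NAME_RULE = ("coat of arms / flag", re.compile(r"coa|flag", re.IGNORECASE))

# Uploaders known to license their own pictures freely (example from the thread).
KNOWN_FREE_UPLOADERS = {"Arpingstone"}


def candidate_buckets(title: str, text: str,
                      uploader: Optional[str] = None) -> List[str]:
    """Return every candidate list this image belongs to, in rule order."""
    buckets: List[str] = []
    if uploader in KNOWN_FREE_UPLOADERS:
        buckets.append("known free uploader")
    for bucket, pattern in TEXT_RULES:
        if pattern.search(text):
            buckets.append(bucket)
    name_bucket, name_pattern = NAME_RULE
    if name_pattern.search(title):
        buckets.append(name_bucket)
    return buckets


def build_lists(pages: Iterable[Tuple[str, str, Optional[str]]]) -> Dict[str, List[str]]:
    """Group titles into worklists; images matching no rule go to "no hypothesis"."""
    lists: Dict[str, List[str]] = defaultdict(list)
    for title, text, uploader in pages:
        for bucket in candidate_buckets(title, text, uploader) or ["no hypothesis"]:
            lists[bucket].append(title)
    return dict(lists)
```

An image can land in several lists at once (say, "PD" in the text and "flag" in the name), which matches the goal of working through one criterion and one tag at a time.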
Dream
This project was in my dream last night... a large room with fifty or so professional image taggers, and a counter at the front showing the number of images remaining untagged. It was down to 200 at that time and everyone was getting pretty excited... Does this get me wikiholic points? Zeimusu 02:41, 2005 Jan 13 (UTC)
Wow, we are now at that point, Zeimusu. Is it just like you dreamed? (I have to admit: with about 300 left to tag, I'm feeling mighty excited. But don't tell any of my non-Wiki friends. They wouldn't understand.) – Quadell (talk) (help) 21:00, Jan 21, 2005 (UTC)
Zero
We are at zero. And it's just like my dream! Time to open up the coffee house. Zeimusu 01:20, 2005 Jan 22 (UTC)
- Is there some reason not to trumpet the completion of round 1? The list is now empty!! Stan 04:19, 22 Jan 2005 (UTC)
Image Statistics
For some statistics on images in the database dump of January 7, 2005, see User:RedWolf/Image Statistics. Once I can get a db dump after today, I can update the stats to better reflect the completion of round 1. RedWolf 02:26, Jan 22, 2005 (UTC)
Woo-hoo!
It's done! Cigars for everybody! Drinks on the house! I love each and every one of you buggers! – Quadell (talk) (help) 04:40, Jan 22, 2005 (UTC)
- I don't mean to rain on your parade, but there is still work to do. For example, Category:Images with unknown source contains plenty of images that could have received proper tags. Fortunately, it's much easier to go through this now in gallery mode, which makes it easy to spot album covers, crests and coats of arms, logos, etc. --MarkSweep 06:01, 22 Jan 2005 (UTC)
- You say that as if it were a bad thing. Since this project's initial list reached zero, I hardly know what to do with myself. Bring 'em on! -- Kbh3rd 15:53, 22 Jan 2005 (UTC)
Congratulations to everyone who helped with this -- an amazing effort! Thank you very much for doing this vital work. Catherine\talk 07:27, 29 Jan 2005 (UTC)